Vice Data Analytics

Endri Raco

2022-04-25

Introduction

The aim of this report is to answer following questions using data techniques:

  1. How company content strategy has shifted over time.

  2. Are all kinds of engagement beneficial for video popularity? Naturally, a more popular video will have more reactions of all kinds, but does a higher fraction of, say, “Angry” reactions, have a negative effect on video performance?

  3. Are there any topics, word combinations which always perform higher than average, or have been successful as of recently?

We will use dataset vice_data_for_test_task This dataset contains Facebook video data from the past three years. The data concerns posts from four pages belonging to VICE.

For all project calculations is used the following PC:

print('Operating System:')
## [1] "Operating System:"
version
##                _                           
## platform       x86_64-w64-mingw32          
## arch           x86_64                      
## os             mingw32                     
## system         x86_64, mingw32             
## status                                     
## major          4                           
## minor          1.2                         
## year           2021                        
## month          11                          
## day            01                          
## svn rev        81115                       
## language       R                           
## version.string R version 4.1.2 (2021-11-01)
## nickname       Bird Hippie

 

Data preparation

Importing data

data_path <- here("data", "vice_data_for_test_task.csv")
vice_data <- read_csv(data_path)

A first glimpse

First, we make a check if our data format is indeed data frame:

 

# Check format
class(vice_data)
## [1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

We see that vice_data data frame has 18497 rows and 37 variables.

 

Now let’s check the structure of vice_data data frame

# Check structure
glimpse(vice_data)
## Rows: 18,497
## Columns: 37
## $ `Page Name`                      <chr> "VICE News", "VICE News", "VICE", "VI~
## $ `User Name`                      <chr> "vicenews", "vicenews", "VICE", "vice~
## $ `Facebook Id`                    <dbl> 236000000000000, 236000000000000, 167~
## $ `Page Category`                  <chr> "MEDIA_NEWS_COMPANY", "MEDIA_NEWS_COM~
## $ `Page Admin Top Country`         <chr> "US", "US", "US", "US", "US", "US", "~
## $ `Page Description`               <chr> "VICE News Tonight airs Monday–Thursd~
## $ `Page Created`                   <chr> "2014-02-23 19:00:02 EST", "2014-02-2~
## $ `Likes at Posting`               <dbl> 3339049, 3339049, 8312112, 3339023, 8~
## $ `Followers at Posting`           <chr> "4342864", "4342864", "9754669", "434~
## $ `Post Created`                   <chr> "2021-05-26 04:00:18 EDT", "2021-05-2~
## $ Type                             <chr> "Native Video", "Native Video", "Nati~
## $ `Total Interactions`             <dbl> 54, 41, 66, 351, 24, 132, 358, 139, 7~
## $ Likes                            <dbl> 34, 23, 19, 77, 12, 35, 151, 36, 15, ~
## $ Comments                         <dbl> 4, 5, 5, 126, 6, 54, 79, 44, 21, 53, ~
## $ Shares                           <dbl> 8, 8, 8, 60, 1, 21, 48, 21, 15, 12, 2~
## $ Love                             <dbl> 6, 1, 6, 5, 0, 1, 1, 1, 1, 13, 1, 7, ~
## $ Wow                              <dbl> 2, 0, 0, 8, 0, 2, 9, 1, 0, 1, 1, 1, 0~
## $ Haha                             <dbl> 0, 2, 3, 19, 2, 15, 58, 22, 5, 23, 40~
## $ Sad                              <dbl> 0, 1, 22, 3, 0, 2, 5, 11, 10, 0, 1, 1~
## $ Angry                            <dbl> 0, 1, 1, 52, 0, 2, 5, 1, 0, 1, 2, 0, ~
## $ Care                             <dbl> 0, 0, 2, 1, 3, 0, 2, 2, 8, 2, 1, 0, 1~
## $ `Video Share Status`             <chr> "crosspost", "crosspost", "crosspost"~
## $ `Is Video Owner?`                <chr> "Yes", "No", "No", "No", "No", "No", ~
## $ `Post Views`                     <dbl> 3213, 1745, 7268, 8294, 2761, 25601, ~
## $ `Total Views`                    <dbl> 3214, 1752, 7273, 8375, 2761, 25672, ~
## $ `Total Views For All Crossposts` <dbl> 1793907, 13838, 81146, 10240, 129914,~
## $ `Video Length`                   <chr> "0:17:38", "0:09:21", "0:24:57", "0:0~
## $ URL                              <chr> "https://www.facebook.com/23585288990~
## $ Message                          <chr> "Tattoos are stigmatized in Japan bec~
## $ Link                             <chr> "https://www.facebook.com/vicenews/vi~
## $ `Final Link`                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ `Image Text`                     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ `Link Text`                      <chr> "Inside the Underground Pilgrimage Th~
## $ Description                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ `Sponsor Id`                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ `Sponsor Name`                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~
## $ `Sponsor Category`               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, N~

It is a good idea to check for dublicates in rows so to create a general idea about real amount of data.

 

# Distinct users, movies, genres
nrow(vice_data %>% distinct())
## [1] 18497

 

Let’s repair the names of variables:

# Name repair
vice_data_cl <- janitor::clean_names(vice_data)

Now time for checking problems in dataset previous turning to data analysis

diagnose(vice_data_cl)
## # A tibble: 37 x 6
##    variables        types missing_count missing_percent unique_count unique_rate
##    <chr>            <chr>         <int>           <dbl>        <int>       <dbl>
##  1 page_name        char~             0               0            3   0.000162 
##  2 user_name        char~             0               0            4   0.000216 
##  3 facebook_id      nume~             0               0            4   0.000216 
##  4 page_category    char~             0               0            2   0.000108 
##  5 page_admin_top_~ char~             0               0            1   0.0000541
##  6 page_description char~             0               0            4   0.000216 
##  7 page_created     char~             0               0            4   0.000216 
##  8 likes_at_posting nume~             0               0         3906   0.211    
##  9 followers_at_po~ char~             0               0         3899   0.211    
## 10 post_created     char~             0               0        18320   0.990    
## # ... with 27 more rows

Data Wrangling

When we diagnosed vice_data_cl data frame we noticed that final_link, image_text, description, sponsor_id, sponsor_name, sponsor_category variables have more than \(90\%\) missing data. Also we can notice that page_admin_top_country variables has a single value US so it will not be included in analytics. Let’s remove these variables

vice_data_cl <- vice_data_cl %>% select(-c('final_link', 'image_text', 'description', 'sponsor_id', 'sponsor_name', 'sponsor_category'))

Next step is to turn our two variables page_created and post_created to the right date-time format. We will use Vilnius timezone where company is located.

vice_data_cl$page_created <- as.POSIXct(vice_data_cl$page_created, tz = 'Europe/Vilnius')
vice_data_cl$post_created <- as.POSIXct(vice_data_cl$post_created, tz = 'Europe/Vilnius')
vice_data_cl$video_length <- lubridate::period_to_seconds(lubridate::hms(vice_data_cl$video_length))

Analytics

Question 1. Based on the data, comment on how VICE’s content strategy has shifted over time. You are free to focus on just a few aspects of your choice.

We’ll walk through several video metrics to answer question 1.

Post Creation

Posting by year

vice_data_cl %>%  
  mutate(year = lubridate::year(post_created)) %>% 
  group_by(year) %>% summarise(freq = n()) -> year_freqs 
ggplot(year_freqs, aes(x=year, y=freq)) +
  geom_bar(fill = 'green', stat='identity') 

Posting by month

vice_data_cl %>%  
  mutate(year = lubridate::year(post_created)) %>% 
  mutate(month = lubridate::month(post_created, label=TRUE)) %>%   
  group_by(year, month) %>% 
  summarise(freq = n())  -> month_freqs 
# subset 2 months around flood
month_freqs %>%
  ggplot(aes(x = month, y = freq)) +
  geom_bar(stat = "identity", fill = "darkorchid4") +
  facet_wrap(~ year, ncol = 1) +
  labs(title = "Monthly Video Postings")

Posting by day

vice_data_cl %>%  
  mutate(year = lubridate::year(post_created)) %>% 
  mutate(day = lubridate::date(post_created)) %>% 
  group_by(year, day) %>% 
  summarise(freq = n())  -> day_freqs
ggplot(day_freqs, aes(x = day, y = freq)) + 
  geom_line(aes(color = factor(year))) 

Frequency of Daily Posting

source("https://raw.githubusercontent.com/iascchen/VisHealth/master/R/calendarHeat.R")
vcl <- vice_data_cl %>%  
  select(post_created) %>%  
  group_by(post_created) %>% 
  summarise(freq = n()) 

r2g <- c("#D61818", "#FFAE63", "#FFFFBD", "#B5E384")
calendarHeat(vcl$post_created, vcl$freq, ncolors = 99, color = "r2g", varname="Frequency of Daily Posting")

#### Monthly Average of Daily POsts

vlc <- vice_data_cl %>%  
select(post_created) %>% 
count(post_created)  %>%
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"))


by_month <- vlc %>%
  group_by(Start.Month) %>%           
  summarise(av_posts = mean(n)) 

ggplot( data = by_month, 
aes(x = Start.Month, y = av_posts, fill=as.factor(lubridate::year(Start.Month)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Monthly Average of Daily Posts", x=NULL,  y="Number of Posts") + 
  theme_minimal() +
  theme(legend.position = "none") 

Weekly Average of Daily POsts

vlc <- vice_data_cl %>%  
select(post_created) %>% 
count(post_created)  %>%
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"))


by_week <- vlc %>%
  group_by(Start.Week) %>%           
  summarise(av_posts = mean(n)) 

ggplot( data = by_week, 
aes(x = Start.Week, y = av_posts, fill=as.factor(lubridate::year(Start.Week)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Weekly Average of Daily Posts", x=NULL,  y="Number of Posts") + 
  theme_minimal() +
  theme(legend.position = "none") 

Page Posting Over Time

vice_data_cl %>%  select(post_created, page_name) %>% 
  group_by(post_created, page_name) %>% 
  summarise(freq = n()) %>%
  spread(key=page_name, value=freq) %>%
  select(-post_created) %>%
  ts_plot( title = "Page Posting over Time",
          Xtitle = "Time",
          Ytitle = "Number of Posts")

Monthly Average of Page Posting Over Time

vlc<- vice_data_cl %>%  
select(post_created, page_name) %>% 
group_by(post_created, page_name) %>%  
count(post_created)  %>%
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"))


vlc %>%
  group_by(Start.Month, page_name) %>%           
  summarise(av_posts = mean(n)) %>%
  spread(key=page_name, value=av_posts) %>%
  ts_plot( title = "Page Posting over Time",
          Xtitle = "Time",
          Ytitle = "Number of Posts")

Weekly Average of Page Posting Over Time

vlc<- vice_data_cl %>%  
select(post_created, page_name) %>% 
group_by(post_created, page_name) %>%  
count(post_created)  %>%
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"))


vlc %>%
  group_by(Start.Week, page_name) %>%           
  summarise(av_posts = mean(n)) %>%
  spread(key=page_name, value=av_posts) %>%
  ts_plot( title = "Page Posting over Time",
          Xtitle = "Time",
          Ytitle = "Number of Posts")

Daily Post Views Over Time

View count is the total number of people who have viewed your video.

Facebook measure a view by checking if someone views your video for 3 seconds (same for Live videos)

View count can be considered more of a vanity metric, as the number of views don’t really affect your bottom line if no other action is taken. However, this still shows us that we need to make those first 3-30 seconds hyper-engaging in order to reel a viewer in.

don <- xts(x = vice_data_cl$post_views, order.by = vice_data_cl$post_created)
# Finally the plot
p <- dygraph(don, main = "Post Views Over Time", 
        ylab = "Number of Views") %>%
  dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = FALSE, colors="#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE)  %>%
  dyRoller(rollPeriod = 1) 
p 

Daily Total Views Over Time

don <- xts(x = vice_data_cl$total_views, order.by = vice_data_cl$post_created)
# Finally the plot
p <- dygraph(don, main = "Total Views Over Time", 
        ylab = "Number of Views") %>%
  dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = FALSE, colors="#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE)  %>%
  dyRoller(rollPeriod = 1) 
p 

Monthly Average of Views Over Time

vlc <- vice_data_cl %>%  
select(post_created, post_views) %>%
group_by(post_created) %>%  
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"))


by_month <- vlc %>%
  group_by(Start.Month) %>%           
  summarise(av_views = mean(post_views)) 

ggplot( data = by_month, 
aes(x = Start.Month, y = av_views, fill=as.factor(lubridate::year(Start.Month)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Monthly Average of Views Over Time", x=NULL,  y="Number of Views") + 
  theme_minimal() +
  theme(legend.position = "none") 

Weekly Average of Views Over Time

vlc <- vice_data_cl %>%  
select(post_created, post_views) %>%
group_by(post_created) %>%  
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"))


by_week <- vlc %>%
  group_by(Start.Week) %>%           
  summarise(av_views = mean(post_views)) 

ggplot( data = by_week, 
aes(x = Start.Week, y = av_views, fill=as.factor(lubridate::year(Start.Week)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Weekly Average of Views Over Time", x=NULL,  y="Number of Views") + 
  theme_minimal() +
  theme(legend.position = "none") 

Total Views for all Crossposts Over Time

don <- xts(x = vice_data_cl$total_views_for_all_crossposts, order.by = vice_data_cl$post_created)
# Finally the plot
p <- dygraph(don, main = "Total Views for all Crossposts Over Time", 
        ylab = "Number of Views") %>%
  dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = FALSE, colors="#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE)  %>%
  dyRoller(rollPeriod = 1) 
p 

Monthly Average of Total Views for all Crossposts Over Time

vlc <- vice_data_cl %>%  
select(post_created, total_views_for_all_crossposts) %>%
group_by(post_created) %>%  
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"))


by_month <- vlc %>%
  group_by(Start.Month) %>%           
  summarise(av_views = mean(total_views_for_all_crossposts)) 

ggplot( data = by_month, 
aes(x = Start.Month, y = av_views, fill=as.factor(lubridate::year(Start.Month)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Monthly Average of Total Views for all Crossposts Over Time", x=NULL,  y="Number of Views") + 
  theme_minimal() +
  theme(legend.position = "none") 

Weekly Average of Total Views for all Crossposts Over Time

vlc <- vice_data_cl %>%  
select(post_created, total_views_for_all_crossposts) %>%
group_by(post_created) %>%  
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"))


by_week <- vlc %>%
  group_by(Start.Week) %>%           
  summarise(av_views = mean(total_views_for_all_crossposts)) 

ggplot( data = by_week, 
aes(x = Start.Week, y = av_views, fill=as.factor(lubridate::year(Start.Week)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Weekly Average of Total Views for all Crossposts Over Time", x=NULL,  y="Number of Views") + 
  theme_minimal() +
  theme(legend.position = "none") 

Total Views vs Total Views for All Crossposts Overtime

vice_data_cl %>%  select(post_created, total_views, total_views_for_all_crossposts) %>% 
  group_by(post_created) %>% 
  ts_plot(title = " Total views vs Total Views for All crossposts Over Time",
          Xtitle = "Time",
          Ytitle = "Frequency")

Monthly Average of Total Views vs Total Views for All Crossposts Overtime

vlc <- vice_data_cl %>%  
select(post_created, total_views, total_views_for_all_crossposts) %>%
group_by(post_created) %>%  
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"))

vlc %>%
  group_by(Start.Month) %>%           
  summarise(av_tot_views = mean(total_views), av_tot_cviews = mean(total_views_for_all_crossposts)) %>%
  ts_plot(title = " Monthly Average of Total Views vs Total Views for All Crossposts Overtime",
          Xtitle = "Time",
          Ytitle = "Frequency")

Weekly Average of Total Views vs Total Views for All Crossposts Overtime

vlc <- vice_data_cl %>%  
select(post_created, total_views, total_views_for_all_crossposts) %>%
group_by(post_created) %>%  
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"))

vlc %>%
  group_by(Start.Week) %>%           
  summarise(av_tot_views = mean(total_views), av_tot_cviews = mean(total_views_for_all_crossposts)) %>%
  ts_plot(title = " Weekly Average of Total Views vs Total Views for All Crossposts Overtime",
          Xtitle = "Time",
          Ytitle = "Frequency")

Video Length

Posted Video Length

vice_data_cl %>%
  filter( video_length < 1200 ) %>%
  ggplot( aes(x= video_length)) +
  geom_histogram( binwidth=10, fill="#69b3a2", color="#e9ecef", alpha=0.9) +
  ggtitle("Histogram of Posted Video Length ") +
  theme_ipsum() +
  theme(
    plot.title = element_text(size=15)
  ) + 
  scale_y_continuous(breaks=seq(0,1000,50)) + 
  scale_x_continuous(breaks=seq(0,1200,100))

Length of Video Posts in Time

vice_data_cl %>%  select(post_created, video_length) %>%
  filter( video_length < 1200 ) %>%
  mutate(year = lubridate::year(post_created)) %>%
select(year, video_length) %>%
ggplot(aes(x=video_length, fill = as.factor(year)))+
  geom_histogram( color='#e9ecef', alpha=0.6) + 
labs(title = "Posted Video Lengths in Years") +
xlab('Video Length') +
ylab('Frequency of Video Posts') +
guides(fill=guide_legend(title="Years"))  

Monthly Average of Video Length

vlc <- vice_data_cl %>%  
select(post_created, video_length) %>% 
group_by(post_created)  %>%
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"))


by_month <- vlc %>%
  group_by(Start.Month) %>%           
  summarise(av_length = mean(video_length)) 

ggplot( data = by_month, 
aes(x = Start.Month, y = av_length, fill=as.factor(lubridate::year(Start.Month)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Monthly Average of Video Length", x=NULL,  y="Video Length") + 
  theme_minimal() +
  theme(legend.position = "none") +
scale_y_continuous(breaks=seq(0, 1000,100), limits=c(0,1000))

Weekly Average of Daily POsts

vlc <- vice_data_cl %>%  
select(post_created, video_length) %>% 
group_by(post_created)  %>%
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"))


by_week <- vlc %>%
  group_by(Start.Week) %>%           
  summarise(av_length = mean(video_length)) 

ggplot( data = by_week, 
aes(x = Start.Week, y = av_length, fill=as.factor(lubridate::year(Start.Week)))) +
geom_col() +  
scale_fill_brewer(palette = "Paired") +
labs(title="Weekly Average of Video Length", x=NULL,  y="Video Length") + 
  theme_minimal() +
  theme(legend.position = "none") +
scale_y_continuous(breaks=seq(0, 1000,100), limits=c(0,1000))

Engagement

Video engagement includes the comments and likes that video content generates.

It’s a good idea to see how many people are actually taking action on your video, but more than that, company pay attention to the types of comments is getting.

Daily User Activity

don <- xts(x = vice_data_cl$total_interactions, order.by = vice_data_cl$post_created)
# Finally the plot
p <- dygraph(don, main = "Total Interactions Over Time", 
        ylab = "Number of Views") %>%
  dyOptions(labelsUTC = TRUE, fillGraph=TRUE, fillAlpha=0.1, drawGrid = FALSE, colors="#D8AE5A") %>%
  dyRangeSelector() %>%
  dyCrosshair(direction = "vertical") %>%
  dyHighlight(highlightCircleSize = 5, highlightSeriesBackgroundAlpha = 0.2, hideOnMouseOut = FALSE)  %>%
  dyRoller(rollPeriod = 1) 
p 

Daily User Activity

vice_data_cl %>%  select(post_created, total_interactions) %>%
filter( total_interactions < 5000 ) %>%
mutate(year = lubridate::year(post_created)) %>%
select(year, total_interactions) %>%
ggplot(aes(x=total_interactions, fill = as.factor(year)))+
geom_histogram( binwidth=200,color="#e9ecef", alpha=0.9) +
ggtitle("Histogram of Total Interactions During Years ") +
theme_ipsum() +
theme(
plot.title = element_text(size=15)
) +
xlab('Total Interactions') +
ylab('Frequency of Total Interactions') +
guides(fill=guide_legend(title="Years"))   

Relationship between different user reactions

vice_data_cl %>% 
select(likes, comments, shares, love, wow, haha, sad, angry, care) %>%
ggpairs()  

Comparision of weekly user interaction rates

vlc <- vice_data_cl %>%  
select(post_created, likes, comments, shares, love, wow, haha, sad, angry, care, total_interactions) %>%
group_by(post_created) %>%  
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"),
       like_ratio = likes/total_interactions,
       comments_ratio = comments/total_interactions,
       shares_ratio = shares/total_interactions,
       love_ratio = love/total_interactions,
       wow_ratio = wow/total_interactions,
       haha_ratio = haha/total_interactions,
       sad_ratio = sad/total_interactions,
       angry_ratio = angry/total_interactions,
       care_ratio = care/total_interactions) %>%
select(post_created, like_ratio, comments_ratio, shares_ratio, love_ratio, wow_ratio, haha_ratio, sad_ratio, angry_ratio, care_ratio, Start.Week)  

vlc %>%
  group_by(Start.Week) %>%           
  summarise(
    av_like_ratio = mean(like_ratio),
    av_comments_ratio = mean(comments_ratio),
    av_shares_ratio = mean(shares_ratio),
    av_love_ratio = mean(love_ratio),
    av_wow_ratio = mean(wow_ratio),
    av_haha_ratio = mean(haha_ratio),
    av_sad_ratio = mean(sad_ratio),
    av_angry_ratio = mean(angry_ratio),
    av_care_ratio  = mean(care_ratio)) %>%
  ts_plot(title = " Comparision of weekly user interaction rates Over Time",
          Xtitle = "Time",
          Ytitle = "")

Comparision of monthly user interaction rates

vlc <- vice_data_cl %>%  
select(post_created, likes, comments, shares, love, wow, haha, sad, angry, care, total_interactions) %>%
group_by(post_created) %>%  
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"),
       like_ratio = likes/total_interactions,
       comments_ratio = comments/total_interactions,
       shares_ratio = shares/total_interactions,
       love_ratio = love/total_interactions,
       wow_ratio = wow/total_interactions,
       haha_ratio = haha/total_interactions,
       sad_ratio = sad/total_interactions,
       angry_ratio = angry/total_interactions,
       care_ratio = care/total_interactions) %>%
select(post_created, like_ratio, comments_ratio, shares_ratio, love_ratio, wow_ratio, haha_ratio, sad_ratio, angry_ratio, care_ratio, Start.Month)  

vlc %>%
  group_by(Start.Month) %>%           
  summarise(
    av_like_ratio = mean(like_ratio),
    av_comments_ratio = mean(comments_ratio),
    av_shares_ratio = mean(shares_ratio),
    av_love_ratio = mean(love_ratio),
    av_wow_ratio = mean(wow_ratio),
    av_haha_ratio = mean(haha_ratio),
    av_sad_ratio = mean(sad_ratio),
    av_angry_ratio = mean(angry_ratio),
    av_care_ratio  = mean(care_ratio)) %>%
  ts_plot(title = " Comparision of monthly user interaction rates Over Time",
          Xtitle = "Time",
          Ytitle = "")

Monthy effect of Angry Reaction in Video Performance

library(ggpubr)
vlc <- vice_data_cl %>%  
select(post_created, angry, total_interactions, likes_at_posting, followers_at_posting, total_views_for_all_crossposts) %>%
group_by(post_created) %>%  
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month"),
       angry_ratio = angry/total_interactions) %>%
select(post_created, angry_ratio, Start.Month,  total_interactions, likes_at_posting, followers_at_posting, total_views_for_all_crossposts)  

vlc <- vlc %>%
  group_by(Start.Month) %>%           
  summarise(
    av_angry_ratio = mean(angry_ratio),
    av_total_interactions  = mean(total_interactions),
    av_likes_at_posting  = mean(likes_at_posting),
    av_total_views_for_all_crossposts  = mean(total_views_for_all_crossposts) ) 


vlc1 <- vlc %>% select(Start.Month, av_angry_ratio)
vlc2 <- vlc %>% select(Start.Month, av_total_interactions)
vlc3 <- vlc %>% select(Start.Month, av_likes_at_posting)
vlc4 <- vlc %>% select(Start.Month, av_total_views_for_all_crossposts)

p1 <- ggplot(vlc1, aes(x=Start.Month, av_angry_ratio)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Angry Ratio by Month") +
  xlab("Time")


p2 <- ggplot(vlc2, aes(x=Start.Month, av_total_interactions)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Monthly Average of Total Interactions") +
  xlab("Time")

p3 <- ggplot(vlc3, aes(x=Start.Month, av_likes_at_posting)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Monthly Average of Likes at Posting") +
  xlab("Time")

p4 <- ggplot(vlc4, aes(x=Start.Month, av_total_views_for_all_crossposts)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Monthly Average of Total Crossposts") +
  xlab("Time")

ggarrange(p1,p2,p3,p4)

Weekly of Angry Reaction in Video Performance

library(ggpubr)
vlc <- vice_data_cl %>%  
select(post_created, angry, total_interactions, likes_at_posting, followers_at_posting, total_views_for_all_crossposts) %>%
group_by(post_created) %>%  
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week"),
       angry_ratio = angry/total_interactions) %>%
select(post_created, angry_ratio, Start.Week,  total_interactions, likes_at_posting, followers_at_posting, total_views_for_all_crossposts)  

vlc <- vlc %>%
  group_by(Start.Week) %>%           
  summarise(
    av_angry_ratio = mean(angry_ratio),
    av_total_interactions  = mean(total_interactions),
    av_likes_at_posting  = mean(likes_at_posting),
    av_total_views_for_all_crossposts  = mean(total_views_for_all_crossposts) ) 


vlc1 <- vlc %>% select(Start.Week, av_angry_ratio)
vlc2 <- vlc %>% select(Start.Week, av_total_interactions)
vlc3 <- vlc %>% select(Start.Week, av_likes_at_posting)
vlc4 <- vlc %>% select(Start.Week, av_total_views_for_all_crossposts)

p1 <- ggplot(vlc1, aes(x=Start.Week, av_angry_ratio)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Angry Ratio by Week") +
  xlab("Time")


p2 <- ggplot(vlc2, aes(x=Start.Week, av_total_interactions)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Weekly Average of Total Interactions") +
  xlab("Time")

p3 <- ggplot(vlc3, aes(x=Start.Week, av_likes_at_posting)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Weekly Average of Likes at Posting") +
  xlab("Time")

p4 <- ggplot(vlc4, aes(x=Start.Week, av_total_views_for_all_crossposts)) +
  geom_line( color="steelblue") + 
  geom_point() +
  xlab("") +
  theme_ipsum() +
  theme(axis.text.x=element_text(angle=60, hjust=1)) +
  ylab("Weekly Average of Total Crossposts") +
  xlab("Time")

ggarrange(p1,p2,p3,p4)

Social shares

One of main goals for video content should be social shares. This widens audience exponentially, increasing brand awareness and potentially bringing in new leads.

Video Share Status – owned vs crosspost

vlc<- vice_data_cl %>% 
  select(post_created, video_share_status) %>% 
  group_by(post_created, video_share_status) %>% 
  summarise(freq = n()) %>%
  spread(key=video_share_status, value=freq) %>%
  select(crosspost, owned, share)
ts_plot(vlc,         title = "Video Share Status Over Time",
        Xtitle = "Time",
        Ytitle = "Frequency")

Monthly Average Comparision of Video Share Status – owned vs crosspost

vlc<- vice_data_cl %>% 
  select(post_created, video_share_status) %>% 
  group_by(post_created, video_share_status) %>% 
  summarise(freq = n()) %>%
  spread(key=video_share_status, value=freq) %>%
  select(crosspost, owned, share) %>%
mutate(Start.Month = lubridate::floor_date(post_created, unit = "month")) %>%
  select(Start.Month, crosspost, owned, share)

vlc %>%
  group_by(Start.Month) %>%           
  summarise(
    crosspost_ratio = mean(crosspost),
    owned_ratio = mean(owned),
    share_ratio = mean(share)) %>%
  ts_plot(title = " Monthly Comparision of Video Share Status",
          Xtitle = "Time",
          Ytitle = "")

Weekly Average Comparision of Video Share Status – owned vs crosspost

vlc<- vice_data_cl %>% 
  select(post_created, video_share_status) %>% 
  group_by(post_created, video_share_status) %>% 
  summarise(freq = n()) %>%
  spread(key=video_share_status, value=freq) %>%
  select(crosspost, owned, share) %>%
mutate(Start.Week = lubridate::floor_date(post_created, unit = "week")) %>%
  select(Start.Week, crosspost, owned, share)

vlc %>%
  group_by(Start.Week) %>%           
  summarise(
    crosspost_ratio = mean(crosspost),
    owned_ratio = mean(owned),
    share_ratio = mean(share)) %>%
  ts_plot(title = " Monthly Comparision of Video Share Status",
          Xtitle = "Time",
          Ytitle = "")